Semantic Compression for Edge-Assisted Systems
A novel semantic approach to data selection and compression is presented for
the dynamic adaptation of IoT data processing and transmission within "wireless
islands", where a set of sensing devices (sensors) are interconnected through
one-hop wireless links to a computational resource via a local access point.
The core of the proposed technique is a cooperative framework where local
classifiers at the mobile nodes are dynamically crafted and updated based on
the current state of the observed system, the global processing objective and
the characteristics of the sensors and data streams. The edge processor plays a
key role by establishing a link between content and operations within the
distributed system. The local classifiers are designed to filter the data
streams and provide only the needed information to the global classifier at the
edge processor, thus minimizing bandwidth usage. However, the better the
accuracy of these local classifiers, the larger the energy necessary to run
them at the individual sensors. A formulation of the optimization problem for
the dynamic construction of the classifiers under bandwidth and energy
constraints is proposed and demonstrated on a synthetic example.
Comment: Presented at the Information Theory and Applications Workshop (ITA),
February 17, 201
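The trade-off described in this abstract (more accurate local classifiers cost more energy but save bandwidth) can be illustrated with a toy selection problem. The sketch below is not the paper's formulation: the option table, its numbers, and the function name `choose_classifiers` are all hypothetical, and the search is a plain brute-force enumeration.

```python
from itertools import product

# Hypothetical classifier options per sensor:
# (name, accuracy, energy_cost, bandwidth_used). All numbers are made up.
OPTIONS = [
    ("none", 0.50, 0.0, 10.0),   # forward everything: no local energy, high bandwidth
    ("light", 0.70, 1.0, 5.0),   # cheap filter
    ("heavy", 0.90, 3.0, 2.0),   # accurate filter: more energy, less bandwidth
]


def choose_classifiers(energy_budgets, bandwidth_budget, options=OPTIONS):
    """Pick one option per sensor, maximizing summed accuracy.

    Constraints (assumed for illustration): each sensor's option must fit its
    own energy budget, and total bandwidth must fit the shared budget.
    Returns (list of option names, total accuracy), or (None, None) if
    no assignment is feasible.
    """
    best_score, best_names = None, None
    for assignment in product(options, repeat=len(energy_budgets)):
        # Per-sensor energy constraint.
        if any(opt[2] > budget for opt, budget in zip(assignment, energy_budgets)):
            continue
        # Shared bandwidth constraint at the access point.
        if sum(opt[3] for opt in assignment) > bandwidth_budget:
            continue
        score = sum(opt[1] for opt in assignment)
        if best_score is None or score > best_score:
            best_score = score
            best_names = [opt[0] for opt in assignment]
    return best_names, best_score


names, score = choose_classifiers(energy_budgets=[3.0, 1.0], bandwidth_budget=8.0)
print(names, score)
```

With these made-up numbers, the second sensor cannot afford the heavy classifier, so the first sensor must run it to keep the total bandwidth within budget. Brute force is only workable for a handful of sensors; a real system would need a proper solver.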
Distantly Labeling Data for Large Scale Cross-Document Coreference
Cross-document coreference, the problem of resolving entity mentions across
multi-document collections, is crucial to automated knowledge base construction
and data mining tasks. However, the scarcity of large labeled data sets has
hindered supervised machine learning research for this task. In this paper we
develop and demonstrate an approach based on "distantly-labeling" a data set
from which we can train a discriminative cross-document coreference model. In
particular we build a dataset of more than a million people mentions extracted
from 3.5 years of New York Times articles, leverage Wikipedia for distant
labeling with a generative model (and measure the reliability of such
labeling); then we train and evaluate a conditional random field coreference
model that has factors on cross-document entities as well as mention-pairs.
This coreference model obtains high accuracy in resolving mentions and entities
that are not present in the training data, indicating applicability to
non-Wikipedia data. Given the large amount of data, our work is also an
exercise demonstrating the scalability of our approach.
Comment: 16 pages, submitted to ECML 201
Connotation Frames: A Data-Driven Investigation
Through a particular choice of a predicate (e.g., "x violated y"), a writer
can subtly connote a range of implied sentiments and presupposed facts about
the entities x and y: (1) writer's perspective: projecting x as an
"antagonist" and y as a "victim", (2) entities' perspective: y probably dislikes
x, (3) effect: something bad happened to y, (4) value: y is something valuable,
and (5) mental state: y is distressed by the event. We introduce connotation
frames as a representation formalism to organize these rich dimensions of
connotation using typed relations. First, we investigate the feasibility of
obtaining connotative labels through crowdsourcing experiments. We then present
models for predicting the connotation frames of verb predicates based on their
distributional word representations and the interplay between different types
of connotative relations. Empirical results confirm that connotation frames can
be induced from various data sources that reflect how people use language and
give rise to the connotative meanings. We conclude with analytical results that
show the potential use of connotation frames for analyzing subtle biases in
online news media.
Comment: 11 pages, published in Proceedings of ACL 201
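As a rough illustration of what "typed relations" could look like in practice, the sketch below encodes the "x violated y" example from this abstract as a small data structure. The class name, field names, and the "+"/"-" polarity labels are assumptions made for illustration and are not the paper's actual representation.

```python
from dataclasses import dataclass, field


@dataclass
class ConnotationFrame:
    """Hypothetical container for the connotation dimensions of one predicate."""
    predicate: str
    # (holder, target) -> polarity of the holder's attitude toward the target
    perspective: dict = field(default_factory=dict)
    # argument -> polarity of what happened to it
    effect: dict = field(default_factory=dict)
    # argument -> whether it is presupposed to be valuable
    value: dict = field(default_factory=dict)
    # argument -> polarity of its mental state after the event
    mental_state: dict = field(default_factory=dict)


# Encoding of "x violated y", following the five dimensions in the abstract.
# Reading "victim" as a sympathetic (+) writer attitude toward y is an assumption.
violate = ConnotationFrame(
    predicate="violate",
    perspective={("writer", "x"): "-", ("writer", "y"): "+", ("y", "x"): "-"},
    effect={"y": "-"},
    value={"y": "+"},
    mental_state={"y": "-"},
)


def describe(frame):
    """Flatten a frame into readable 'relation(args) = polarity' strings."""
    lines = [f"perspective({h}->{t}) = {p}" for (h, t), p in frame.perspective.items()]
    lines += [f"effect({a}) = {p}" for a, p in frame.effect.items()]
    lines += [f"value({a}) = {p}" for a, p in frame.value.items()]
    lines += [f"mental_state({a}) = {p}" for a, p in frame.mental_state.items()]
    return lines


for line in describe(violate):
    print(line)
```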